(57) Abstract: The present invention relates to a solution for improving the perceived sound quality of a decoded acoustic signal.
The improvement is accomplished by extending the spectrum of a received narrow-band acoustic signal (a_NB). According
to the invention, a wide-band acoustic signal (a_WB) is produced by extracting at least one essential attribute (z_NB) from the
narrow-band acoustic signal (a_NB). Parameters, e.g. representing signal energies, with respect to wide-band frequency components outside
the spectrum (A_NB) of the narrow-band acoustic signal (a_NB) are estimated based on the at least one essential attribute (z_NB). This
estimation involves allocating a parameter value to a wide-band frequency component, based on a corresponding confidence level.
For instance, a relatively high parameter value is allowed to be allocated to a frequency component if it has a comparatively high
degree of certainty. In contrast, only a relatively low parameter value is allowed to be allocated to a frequency component if it is
associated with a comparatively low degree of certainty.

Bandwidth Extension of Acoustic Signals

THE BACKGROUND OF THE INVENTION AND PRIOR ART

The present invention relates generally to the improvement of 
the perceived sound quality of decoded acoustic signals. More 
particularly the invention relates to a method of producing a 
wide-band acoustic signal on the basis of a narrow-band acoustic
signal according to the preamble of claim 1 and a signal decoder
according to the preamble of claim 24. The invention also 
relates to a computer program according to claim 22 and a 
computer readable medium according to claim 23. 

Today's public switched telephony networks (PSTNs) generally
low-pass filter any speech or other acoustic signal that they
transport. The low-pass (or, in fact, band-pass) filtering
characteristic is caused by the networks' limited channel
bandwidth, which typically ranges from 0.3 kHz to 3.4 kHz. Such
a band-pass filtered acoustic signal is normally perceived by a
human listener to have a relatively poor sound quality. For
instance, a reconstructed voice signal is often reported to sound
muffled and/or remote from the listener.

The trend in fixed and mobile telephony as well as in
videoconferencing is, however, towards an improved quality of the
acoustic source signal that is reconstructed at the receiver end.
This trend reflects the customer expectation that said systems
provide a sound quality that is much closer to the acoustic
source signal than what today's PSTNs can offer.

One way to meet this expectation is, of course, to broaden the
frequency band for the acoustic source signal and thus convey
more of the information contained in the source signal to
the receiver. For instance, if a 0 - 8 kHz acoustic signal
(sampled at 16 kHz) were transmitted to the receiver, the
naturalness of a human voice signal, which is otherwise lost in a
standard phone call, would indeed be better preserved.
However, increasing the bandwidth for each channel by more
than a factor of two would either reduce the transmission capacity
to less than half or imply enormous costs for the network
operators in order to expand the transmission resources by a
corresponding factor. Hence, this solution is not attractive from
a commercial point of view.

Instead, recovering, at the receiver end, wide-band frequency
components outside the bandwidth of a regular PSTN channel
based on the narrow-band signal that has passed through the
PSTN constitutes a much more appealing alternative. The
recovered wide-band frequency components may lie both in a
low-band below the narrow-band (e.g. in a range 0.1 - 0.3 kHz)
and in a high-band above the narrow-band (e.g. in a range
3.4 - 8.0 kHz).

Although the majority of the energy in a speech signal is 
spectrally located between 0 kHz and 4 kHz, a substantial 
amount of the energy is also distributed in the frequency band 
from 4 kHz to 8 kHz. The frequency resolution of human
hearing decreases rapidly with increasing frequency. The
frequency components between 4 kHz and 8 kHz therefore
require comparatively small amounts of data to model with a 
sufficient accuracy. 

It is possible to extend the bandwidth of the narrow-band
acoustic signal with a perceptually satisfying result, since the
signal is presumed to be generated by a physical source, for
instance, a human speaker. Thus, given a particular shape of
the narrow-band, there are constraints on the signal properties
with respect to the wide-band shape. That is, only certain
combinations of narrow-band shapes and wide-band shapes are
conceivable.

However, modelling a wide-band signal from a particular
narrow-band signal is still far from trivial. The existing
methods for extending the bandwidth of the acoustic signal with
a high-band above the current narrow-band spectrum basically
include two different components, namely: estimation of the
high-band spectral envelope from information pertaining to the
narrow-band, and recovery of an excitation for the high-band
from a narrow-band excitation.

All the known methods, in one way or another, model
dependencies between the high-band envelope and various
features describing the narrow-band signal. For instance, a
Gaussian mixture model (GMM), a hidden Markov model (HMM)
or vector quantisation (VQ) may be utilised for accomplishing
this modelling. A minimum mean square error (MMSE) estimate
of the high-band spectral envelope is then obtained from the
chosen model of dependencies, given the features that have
been derived from the narrow-band signal. Typically, the
features include a spectral envelope, a spectral temporal
variation and a degree of voicing.

The narrow-band excitation is used for recovering a
corresponding high-band excitation. This can be carried out by
simply up-sampling the narrow-band excitation, without any
following low-pass filtering. This, in turn, creates a
spectral-folded version of the narrow-band excitation around the
upper bandwidth limit for the original excitation. Alternatively,
the recovery of the high-band excitation may involve techniques
that are otherwise used in speech coding, such as multi-band
excitation (MBE). The latter makes use of the fundamental
frequency and the degree of voicing when modelling an excitation.
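
As a minimal illustration of the spectral folding described above, the following Python sketch up-samples a signal by a factor of two through zero insertion and deliberately omits the low-pass filter. The function name spectral_fold, the 1 kHz test tone and the 8 kHz sampling rate are assumptions chosen for illustration only; they are not taken from any of the known methods.

```python
import numpy as np

def spectral_fold(excitation_nb):
    """Up-sample by 2 via zero insertion, with no low-pass filter afterwards."""
    folded = np.zeros(2 * len(excitation_nb))
    folded[::2] = excitation_nb          # every second sample stays zero
    return folded

fs_nb = 8000                              # assumed narrow-band sampling rate
n = np.arange(160)                        # one 20 ms frame
tone = np.cos(2 * np.pi * 1000 * n / fs_nb)   # 1 kHz test tone (illustrative)

wide = spectral_fold(tone)                # now sampled at 16 kHz
spectrum = np.abs(np.fft.rfft(wide))
freqs = np.fft.rfftfreq(len(wide), d=1.0 / (2 * fs_nb))

# The two strongest spectral lines: the original tone and its mirror image.
peaks = sorted(freqs[np.argsort(spectrum)[-2:]])
print(peaks)
```

In this sketch the tone appears both at 1 kHz and mirrored at 7 kHz, i.e. folded around the 4 kHz upper limit of the original band.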

Irrespective of how the high-band excitation is derived, the
estimated high-band spectral envelope is used for obtaining a
desired shape of the recovered high-band excitation. The result
thereof in turn forms a basis for an estimate of the high-band
acoustic signal. This signal is subsequently high-pass filtered
and added to an up-sampled and low-pass filtered version of the
narrow-band acoustic signal to form a wide-band acoustic signal
estimate.

Normally, the bandwidth extension scheme operates on a 20 ms
frame-by-frame basis, with a certain degree of overlap between
adjacent frames. The overlap is intended to reduce any 
undesired transition effects between consecutive frames. 

Unfortunately, the above-described methods all have one
undesired characteristic in common, namely that they introduce
artefacts in the extended wide-band acoustic signals.
Furthermore, it is not unusual that these artefacts are so
annoying and deteriorate the perceived sound quality to such an
extent that a human listener generally prefers the original
narrow-band acoustic signal to the thus extended wide-band
acoustic signal.

SUMMARY OF THE INVENTION

The object of the present invention is therefore to provide an 
improved bandwidth extension solution for a narrow-band 
acoustic signal, which alleviates the problem above and thus 
produces a wide-band acoustic signal that has a significantly 
enhanced perceived sound quality. The above-indicated problem
being associated with the known solutions is generally deemed 
to be due to an over-estimation of the wide-band energy 
(predominantly in the high-band). 

According to one aspect of the invention the object is achieved
by a method of producing a wide-band acoustic signal on the basis
of a narrow-band acoustic signal as initially described, which is
characterised by allocating a parameter with respect to a
particular wide-band frequency component based on a
corresponding confidence level.

According to a preferred embodiment of the invention, a
relatively high parameter value is thereby allowed to be
allocated to a frequency component if the confidence level
indicates a comparatively high degree of certainty. In contrast,
only a relatively low parameter value is allowed to be allocated
to a frequency component if the confidence level indicates a
comparatively low degree of certainty.

According to one embodiment of the invention, the parameter
directly represents a signal energy for one or more wide-band
frequency components. However, according to an alternative
embodiment of the invention, the parameter only indirectly
reflects a signal energy. In that case the parameter represents
an upper-most bandwidth limit of the wide-band acoustic signal,
such that a high parameter value corresponds to a wide-band
acoustic signal having a relatively large bandwidth, whereas a
low parameter value corresponds to a narrower bandwidth of
the wide-band acoustic signal.

According to a further aspect of the invention the object is
achieved by a computer program directly loadable into the 
internal memory of a computer, comprising software for 
performing the method described in the above paragraph when 
said program is run on a computer. 

According to another aspect of the invention the object is
achieved by a computer readable medium, having a program 
recorded thereon, where the program is to make a computer 
perform the method described in the penultimate paragraph 
above. 

According to still another aspect of the invention the object is
achieved by a signal decoder for producing a wide-band
acoustic signal from a narrow-band acoustic signal as initially
described, which is characterised in that the signal decoder is
arranged to allocate a parameter to a particular wide-band
frequency component based on a corresponding confidence
level.

According to a preferred embodiment of the invention, the
decoder thereby allows a relatively high parameter value to be
allocated to a frequency component if the confidence level
indicates a comparatively high degree of certainty, whereas it
allows only a relatively low parameter value to be allocated to a
frequency component whose confidence level indicates a
comparatively low degree of certainty.

In comparison to the previously known solutions, the proposed
solution significantly reduces the number of artefacts
introduced when extending a narrow-band acoustic signal to a
wide-band representation. Consequently, a human listener
perceives a drastically improved sound quality. This is an
especially desired result, since the perceived sound quality is
deemed to be a key factor in the success of future
telecommunication applications.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention is now to be explained more closely by 
means of preferred embodiments, which are disclosed as 
examples, and with reference to the attached drawings. 

Figure 1 shows a block diagram of a general signal decoder
         according to the invention,

Figure 2 exemplifies a spectrum of a typical acoustic source
         signal in the form of a speech signal,

Figure 3 exemplifies a spectrum of the acoustic source signal
         in figure 2 after having been passed through a
         narrow-band channel,

Figure 4 exemplifies a spectrum of the acoustic signal
         corresponding to the spectrum in figure 3 after having
         been extended to a wide-band acoustic signal
         according to the invention,

Figure 5 shows a block diagram of a signal decoder
         according to an embodiment of the invention,

Figure 6 illustrates a narrow-band frame format according to
         an embodiment of the invention,

Figure 7 shows a block diagram of a part of a feature
         extraction unit according to an embodiment of the
         invention,

Figure 8 shows a graph of an asymmetric cost-function,
         which penalizes over-estimates of an energy-ratio
         between the high-band and the narrow-band
         according to an embodiment of the invention, and

Figure 9 illustrates, by means of a flow diagram, a general
         method according to the invention.



DESCRIPTION OF PREFERRED EMBODIMENTS OF THE 
INVENTION 

Figure 1 shows a block diagram of a general signal decoder
according to the invention, which aims at producing a wide-band
acoustic signal a_WB on the basis of a received narrow-band signal
a_NB, such that the wide-band acoustic signal a_WB perceptually
resembles an estimated acoustic source signal a_source as much
as possible. It is here presumed that the acoustic source signal
a_source has a spectrum A_source, which is at least as wide as the
bandwidth W_WB of the wide-band acoustic signal a_WB, and that the
wide-band acoustic signal a_WB has a wider spectrum A_WB than
the spectrum A_NB of the narrow-band acoustic signal a_NB, which
has been transported via a narrow-band channel that has a
bandwidth W_NB. These relationships are illustrated in figures
2-4. Moreover, the bandwidth W_WB may be sub-divided into a
low-band W_LB, including frequency components between a
lower-most bandwidth limit f_Wl below a lower bandwidth limit
f_Nl of the narrow-band channel and the lower bandwidth limit
f_Nl, and a high-band W_HB, including frequency components
between an upper-most bandwidth limit f_Wu above an upper
bandwidth limit f_Nu of the narrow-band channel and the upper
bandwidth limit f_Nu.

The proposed signal decoder includes a feature extraction unit
101, an excitation extension unit 105, an up-sampler 102, a
wide-band envelope estimator 104, a wide-band filter 106, a
low-pass filter 103, a high-pass filter 107 and an adder 108. The
function of the feature extraction unit 101 is described in the
following paragraph, whereas the remaining units 102 - 108 are
described with reference to the embodiment of the
invention shown in figure 5.

The signal decoder receives a narrow-band acoustic signal a_NB,
either via a communication link (e.g. in a PSTN) or from a storage
medium (e.g. a digital memory). The narrow-band acoustic
signal a_NB is fed in parallel to the feature extraction unit 101, the
excitation extension unit 105 and the up-sampler 102. The
feature extraction unit 101 generates at least one essential
feature z_NB from the narrow-band acoustic signal a_NB. The at
least one essential feature z_NB is used by the following
wide-band envelope estimator 104 to produce a wide-band envelope
estimate s_e. A Gaussian mixture model (GMM) may, for
instance, be utilised to model the dependencies between the
narrow-band feature vector z_NB and a wide-/high-band feature
vector z_WB. The wide-/high-band feature vector z_WB contains, for
instance, a description of the spectral envelope and the
logarithmic energy-ratio between the narrow-band and a
wide-/high-band. The narrow-band feature vector z_NB and the
wide-/high-band feature vector z_WB are combined into a joint
feature vector z = [z_NB, z_WB]. The GMM models a joint probability
density function f_Z(z) of a random variable feature vector Z,
which can
be expressed as:

$$ f_Z(z) = \sum_{m=1}^{M} \alpha_m f_Z(z \mid \theta_m) $$

where M represents a total number of mixture components, α_m is
a weight factor for mixture component m and f_Z(z|θ_m) is a
multivariate Gaussian distribution, which in turn is described by:

$$ f_Z(z \mid \theta_m) = \frac{1}{(2\pi)^{d/2} |C_m|^{1/2}} \exp\left( -\frac{1}{2} (z - \mu_m)^T C_m^{-1} (z - \mu_m) \right) $$

where μ_m represents a mean vector and C_m is a covariance
matrix, both collected in the variable θ_m = {μ_m, C_m}, and d
represents the feature dimension. According to an embodiment of
the invention the feature vector z has 22 dimensions and
consists of the following components:

a narrow-band spectral envelope, for instance modelled by 15
linear frequency cepstral coefficients (LFCCs), i.e. x = {x_1, ..., x_15},

a high-band spectral envelope, for instance modelled by 5 linear
frequency cepstral coefficients, i.e. y = {y_1, ..., y_5},

an energy-ratio variable g denoting a difference in logarithmic
energy between the high-band and the narrow-band, i.e. g = y_0 - x_0,
where y_0 is the logarithmic high-band energy and x_0 is the
logarithmic narrow-band energy, and

a measure representing a degree of voicing r. The degree of
voicing r may, for instance, be determined by localising a
maximum of a normalised autocorrelation function within a lag
range corresponding to 50 - 400 Hz.

According to an embodiment of the invention, the weight factor
α_m and the variable θ_m for m = 1, ..., M are obtained by applying
the so-called expectation-maximisation (EM) algorithm on a
training set extracted from the so-called TIMIT database (TIMIT =
Texas Instruments / Massachusetts Institute of Technology).
The size of the training set is preferably 100 000
non-overlapping 20 ms wide-band signal segments. The features z
are then extracted from the training set and their dependencies
are modelled by, for instance, a GMM with 32 mixture
components (i.e. M = 32).

Figure 5 shows a block diagram of a signal decoder according
to an embodiment of the invention. By way of introduction, the
overall working principle of the decoder is described. Next, the
operation of the specific units included in the decoder will be
described in further detail.

The signal decoder receives a narrow-band acoustic signal a_NB
in the form of segments, each of which has a particular extension
in time T_f, e.g. 20 ms. Figure 6 illustrates an example
narrow-band frame format according to an embodiment of the
invention, where a received narrow-band frame n is followed by
subsequent frames n+1 and n+2. Preferably, adjacent segments
overlap each other to a specific extent T_D, e.g. corresponding to
10 ms. According to an embodiment of the invention, 15 cepstral
coefficients x and a degree of voicing r are repeatedly derived
from each incoming narrow-band segment n, n+1, n+2 etc.
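
The segmentation just described can be sketched as follows in Python. The function name split_into_frames and its parameters are hypothetical; the 8 kHz sampling rate is the example value mentioned elsewhere in this description, and the sketch assumes the signal is at least one frame long.

```python
import numpy as np

def split_into_frames(signal, fs=8000, frame_ms=20, overlap_ms=10):
    """Cut a signal into frames of frame_ms that overlap by overlap_ms."""
    signal = np.asarray(signal)
    frame_len = fs * frame_ms // 1000              # 160 samples for 20 ms
    hop = frame_len - fs * overlap_ms // 1000      # 80 samples for 10 ms overlap
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop: i * hop + frame_len]
                     for i in range(n_frames)])

# 60 ms of dummy samples 0, 1, 2, ... yields five overlapping 20 ms frames.
frames = split_into_frames(np.arange(480))
print(frames.shape)
```

Each row of frames then corresponds to one segment n, n+1, n+2 etc., from which the cepstral coefficients and the degree of voicing would be derived.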

Then, an estimate of an energy-ratio between the narrow-band
and a corresponding high-band is derived by a combined usage
of an asymmetric cost-function and an a-posteriori distribution of
the energy-ratio based on the narrow-band shape (modelled
by the cepstral coefficients x) and the narrow-band voicing
parameter (described by the degree of voicing r). The
asymmetric cost-function penalizes over-estimates of the
energy-ratio more than under-estimates of the energy-ratio.
Moreover, a narrow a-posteriori distribution results in less
penalty on the energy-ratio than a broad a-posteriori
distribution. The energy-ratio estimate, the narrow-band shape x
and the degree of voicing r together form a new a-posteriori
distribution of the high-band shape. An MMSE estimate of the
high-band envelope is also computed on the basis of the
energy-ratio estimate, the narrow-band shape x and the degree of
voicing r. Subsequently, the decoder generates a modified
spectral-folded excitation signal for the high-band. This
excitation is then filtered with the energy-ratio controlled
high-band envelope and added to the narrow-band to form a
wide-band signal a_WB, which is fed out from the decoder.

The feature extraction unit 101 receives the narrow-band
acoustic signal a_NB and produces in response thereto at least
one essential feature z_NB(r, c) that describes particular
properties of the received narrow-band acoustic signal a_NB. The
degree of voicing r, which represents one such essential feature
z_NB(r, c), is determined by localising a maximum of a normalised
autocorrelation function within a lag range corresponding to 50 -
400 Hz. This means that the degree of voicing r may be
expressed as:

$$ r = \max_{\tau} \frac{\sum_{n} s(n)\, s(n+\tau)}{\sqrt{\sum_{n} s(n)^2 \, \sum_{n} s(n+\tau)^2}} $$

where the lag τ runs over the range corresponding to 50 - 400 Hz
and s = s(1), ..., s(160) is a narrow-band acoustic segment
having a duration of T_f (e.g. 20 ms), sampled at, for
instance, 8 kHz.
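
A hedged Python sketch of such a voicing measure is given below. The function name degree_of_voicing is hypothetical, and the sketch adds one assumption that is not part of the description: lags longer than half the buffer are skipped, so that every correlation is computed over a reasonable number of overlapping samples.

```python
import numpy as np

def degree_of_voicing(s, fs=8000, f_min=50.0, f_max=400.0):
    """Maximum of the normalised autocorrelation over the pitch lag range."""
    s = np.asarray(s, dtype=float)
    lag_min = int(fs / f_max)                      # 20 samples at 8 kHz
    # Assumption of this sketch: cap the lag at half the buffer length.
    lag_max = min(int(fs / f_min), len(s) // 2)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        a, b = s[:-lag], s[lag:]                   # overlapping parts
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0.0:
            best = max(best, float(np.dot(a, b) / denom))
    return best

n = np.arange(160)
voiced = np.sin(2 * np.pi * 200 * n / 8000)        # periodic, 200 Hz
noise = np.random.default_rng(0).standard_normal(160)
r_voiced = degree_of_voicing(voiced)
r_noise = degree_of_voicing(noise)
print(r_voiced, r_noise)
```

A strongly periodic segment gives a value close to 1, whereas a noise-like segment gives a clearly lower value.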

The spectral envelope c is here represented by LFCCs. Figure 7
shows a block diagram of a part of the feature extraction unit
101, which is utilised for determining the spectral envelope c
according to this embodiment of the invention.

A segmenting unit 101a separates a segment s of the
narrow-band acoustic signal a_NB that has a duration of T_f = 20 ms.
A following windowing unit 101b windows the segment s with a
window-function w, which may be a Hamming-window. Then, a
transform unit 101c computes a corresponding spectrum S_w by
means of a fast Fourier transform, i.e. S_w = FFT(w·s). The
envelope S_E of the spectrum S_w of the windowed narrow-band
acoustic signal a_NB is obtained by convolving the spectrum S_w
with a triangular window W_T in the frequency domain, which e.g.
has a bandwidth of 100 Hz, in a following convolution unit 101d.
Thus, S_E = S_w * W_T.

A logarithm unit 101e receives the envelope S_E and computes a
corresponding logarithmic value S_E^log according to the
expression:

$$ S_E^{\log} = 20 \log_{10}(S_E) $$

Finally, an inverse transform unit 101f receives the logarithmic
value S_E^log and computes an inverse fast Fourier transform
thereof to represent the LFCCs, i.e.:

$$ c = \mathrm{IFFT}(S_E^{\log}) $$

where c is a vector of linear frequency cepstral coefficients. A
first component c_0 of the vector c constitutes the log energy of
the narrow-band acoustic segment s. This component c_0 is
further used by a high-band shape reconstruction unit 106a and
an energy-ratio estimator 104a that will be described below. The
other components c_1, ..., c_15 in the vector c are used to describe
the spectral envelope x, i.e. x = [c_1, ..., c_15].
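
The chain of units 101a - 101f can be sketched in Python as shown below. This is only an illustration under stated assumptions: the function name lfcc is hypothetical, the magnitude spectrum is used as S_w, the convolution is taken circularly over the FFT bins, the triangular window is normalised to unit sum, and a small constant is added before the logarithm to avoid log(0).

```python
import numpy as np

def lfcc(segment, fs=8000, n_coeffs=15, smooth_hz=100.0):
    """Return (c_0, x) where x holds the cepstral coefficients c_1..c_15."""
    segment = np.asarray(segment, dtype=float)
    w = np.hamming(len(segment))                       # windowing unit
    spectrum = np.abs(np.fft.fft(w * segment))         # transform unit
    bin_hz = fs / len(segment)                         # 50 Hz for 160 samples
    half = max(1, int(round(smooth_hz / bin_hz / 2)))
    tri = np.bartlett(2 * half + 3)[1:-1]              # triangular window
    tri = tri / tri.sum()                              # assumption: unit gain
    padded = np.pad(spectrum, half, mode="wrap")       # circular convolution
    envelope = np.convolve(padded, tri, mode="valid")  # convolution unit
    log_env = 20.0 * np.log10(envelope + 1e-12)        # logarithm unit
    cepstrum = np.fft.ifft(log_env).real               # inverse transform unit
    return float(cepstrum[0]), cepstrum[1:n_coeffs + 1]

seg = np.random.default_rng(0).standard_normal(160)    # dummy 20 ms segment
c0, x = lfcc(seg)
c0_loud, x_loud = lfcc(10.0 * seg)                     # same shape, +20 dB
print(c0_loud - c0)
```

Scaling the segment by a factor of ten raises c_0 by 20 dB while leaving the shape coefficients x essentially unchanged, which is why c_0 can be treated as an energy term and x as a shape description.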

The energy-ratio estimator 104a, which is included in the
wide-band envelope estimator 104, receives the first component c_0
in the vector of linear frequency cepstral coefficients c and
produces, on the basis thereof and on the basis of the narrow-band
shape x and the degree of voicing r, an estimated energy-ratio ĝ
between the high-band and the narrow-band. In order to
accomplish this, the energy-ratio estimator 104a uses a
quadratic cost-function, as is common practice for parameter
estimation from a conditioned probability function. A standard
MMSE estimate ĝ_MMSE is derived by using the a-posteriori
distribution of the energy-ratio given the narrow-band shape x
and the degree of voicing r together with the quadratic
cost-function, i.e.:
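
$$
\begin{aligned}
\hat{g}_{\mathrm{MMSE}} &= \arg\min_{\hat{g}} \int (\hat{g} - g)^2 f_{G|XR}(g \mid x, r)\, dg \\
&= E[G \mid X = x, R = r] \\
&= \int g\, \frac{\sum_{m=1}^{M} \alpha_m f_{GXR}(g, x, r \mid \theta_m)}{\sum_{k=1}^{M} \alpha_k f_{XR}(x, r \mid \theta_k)}\, dg \\
&= \sum_{m=1}^{M} \frac{\alpha_m f_{XR}(x, r \mid \theta_m)}{\sum_{k=1}^{M} \alpha_k f_{XR}(x, r \mid \theta_k)} \int g\, f_{G|XR}(g \mid x, r, \theta_m)\, dg \\
&= \sum_{m=1}^{M} w_m(x, r) \int g\, f_{G|XR}(g \mid x, r, \theta_m)\, dg \\
&= \sum_{m=1}^{M} w_m(x, r) \int g\, f_{G}(g \mid \theta_m)\, dg \\
&= \sum_{m=1}^{M} w_m(x, r)\, \mu_{g,m}, \qquad \mu_{g,m} = E[G \mid \theta_m]
\end{aligned}
$$
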

where, in the second last step, use is made of the fact that each
individual mixture component has a diagonal covariance matrix
and, thus, independent components. Since an over-estimation of
the energy-ratio is deemed to result in a sound that is perceived
as annoying by a human listener, an asymmetric cost-function is
used instead of a symmetric one. Such a function is
capable of penalising over-estimates more than under-estimates
of the energy-ratio. Figure 8 shows a graph of an exemplary
asymmetric cost-function, which thus penalizes over-estimates
of the energy-ratio. The asymmetric cost-function in figure 8
may also be expressed as:

$$ C = b\,U(\hat{g} - g) + (\hat{g} - g)^2 $$

where bU(·) represents a step function with an amplitude b. The
amplitude b can be regarded as a tuning parameter, which
provides a possibility to control the degree of penalty for the
over-estimates. The estimated energy-ratio ĝ can be expressed
as:

$$ \hat{g} = \arg\min_{\hat{g}} \int \left( b\,U(\hat{g} - g) + (\hat{g} - g)^2 \right) f_{G|XR}(g \mid x, r)\, dg $$

The estimated energy-ratio ĝ is found by differentiating the
right-hand side of the expression above and setting it equal to
zero. Assuming that the order of differentiation and integration
may be interchanged, the derivative of the above expression can be
written as:

$$ \sum_{m=1}^{M} w_m(x, r) \int \left( b\,\delta(\hat{g} - g) + 2(\hat{g} - g) \right) f_G(g \mid \theta_m)\, dg = 0, $$

$$ \sum_{m=1}^{M} w_m(x, r)\, b\, f_G(\hat{g} \mid \theta_m) + 2\hat{g} - 2 \sum_{m=1}^{M} w_m(x, r)\, \mu_{g,m} = 0, $$

which in turn yields an estimated energy-ratio ĝ as:

$$ \hat{g} = \sum_{m=1}^{M} w_m(x, r)\, \mu_{g,m} - \frac{b}{2} \sum_{m=1}^{M} w_m(x, r)\, f_G(\hat{g} \mid \theta_m) $$

where μ_{g,m} denotes the mean of the energy-ratio in mixture
component m.

The above equation is preferably solved by a numerical method,
for instance, by means of a grid search. As is apparent from the
above, the estimated energy-ratio ĝ depends on the shape of the
posterior distribution. Consequently, the penalty on the MMSE
estimate ĝ_MMSE of the energy-ratio depends on the width of the
posterior distribution. If the a-posteriori distribution
f_{G|XR}(g|x,r) is narrow, this means that the MMSE estimate
ĝ_MMSE is more reliable than if the a-posteriori distribution is
broad. The width of the a-posteriori distribution can thus be seen
as a confidence level indicator.
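
A grid search of the kind mentioned above can be sketched in Python as follows. The sketch minimises the expected asymmetric cost directly rather than solving the stationarity condition, and it assumes that the a-posteriori distribution of g is a one-dimensional Gaussian mixture with known weights w_m, means and variances. The function name estimate_energy_ratio and all numerical values are illustrative assumptions.

```python
import numpy as np
from math import erf

_erf = np.vectorize(erf)   # element-wise error function from the stdlib

def estimate_energy_ratio(weights, means, variances, b, grid):
    """Grid search for the g_hat that minimises E[b*U(g_hat-G) + (g_hat-G)^2]."""
    grid = np.asarray(grid, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]
    mu = np.asarray(means, dtype=float)[:, None]
    var = np.asarray(variances, dtype=float)[:, None]
    g_hat = grid[None, :]
    # P(G < g_hat) per component: the expected value of the step function.
    cdf = 0.5 * (1.0 + _erf((g_hat - mu) / np.sqrt(2.0 * var)))
    # E[(g_hat - G)^2] per component equals var + (g_hat - mu)^2.
    cost = np.sum(w * (b * cdf + var + (g_hat - mu) ** 2), axis=0)
    return float(grid[np.argmin(cost)])

grid = np.linspace(-5.0, 5.0, 1001)
g_mmse = estimate_energy_ratio([1.0], [0.0], [1.0], b=0.0, grid=grid)
g_asym = estimate_energy_ratio([1.0], [0.0], [1.0], b=2.0, grid=grid)
print(g_mmse, g_asym)
```

With b = 0 the search returns the ordinary MMSE estimate (the posterior mean), whereas a positive b pulls the estimate below the mean, which is the intended protection against over-estimation.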

Parameters other than LFCCs can be used as alternative
representations of the narrow-band spectral envelope x. Line
Spectral Frequencies (LSF), Mel Frequency Cepstral
Coefficients (MFCC), and Linear Prediction Coefficients (LPC)
constitute such alternatives. Furthermore, spectral temporal
variations can be incorporated into the model either by including
spectral derivatives in the narrow-band feature vector z_NB and/or
by changing the GMM to a hidden Markov model (HMM).

Moreover, a classification approach may instead be used to
express the confidence level. This means that a classification
error is exploited to indicate a degree of certainty for a
high-band estimate (e.g. with respect to energy y_0 or shape y).

According to an embodiment of the invention, it is presumed that
the underlying model is a GMM. A so-called Bayes classifier can
then be constructed to classify the narrow-band feature vector
z_NB into one of the mixture components of the GMM. The
probability that this classification is correct can also be
computed. Said classification is based on the assumption that
the observed narrow-band feature vector z was generated from
only one of the mixture components in the GMM. A simple
scenario of a GMM that models the distribution of a narrow-band
feature z using two different mixture components s_1, s_2 (or
states) is shown below.

$$ f_Z(z) = f_{Z,S}(z, s_1) + f_{Z,S}(z, s_2) $$

Suppose a vector z_0 is observed and the classification finds that
the vector most likely originates from a realisation of the
distribution in state s_1. Using Bayes' rule, the probability
P(S=s_1|Z=z_0) that the classification was correct can be
computed as:

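$$
\begin{aligned}
P(S = s_1 \mid Z = z_0) &= \lim_{\Delta \to 0} P\left(S = s_1 \,\middle|\, z_0 - \tfrac{\Delta}{2} < Z < z_0 + \tfrac{\Delta}{2}\right) \\
&= \lim_{\Delta \to 0} \frac{\int_{z_0 - \Delta/2}^{z_0 + \Delta/2} f_{Z|S}(z \mid s_1)\, P(s_1)\, dz}{\int_{z_0 - \Delta/2}^{z_0 + \Delta/2} \left( f_{Z|S}(z \mid s_1)\, P(s_1) + f_{Z|S}(z \mid s_2)\, P(s_2) \right) dz} \\
&= \frac{f_{Z|S}(z_0 \mid s_1)\, P(s_1)}{f_{Z|S}(z_0 \mid s_1)\, P(s_1) + f_{Z|S}(z_0 \mid s_2)\, P(s_2)}
\end{aligned}
$$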

WO 02/086867        PCT/SE02/00485

The probability of a correct classification can then be regarded
as a confidence level. It can thus also be used to control the
energy (or shape) of the bandwidth extended regions W_LB and
W_HB of the wide-band acoustic signal a_WB, such that a relatively
high energy is allocated to frequency components being
associated with a confidence level that represents a
comparatively high degree of certainty, and a relatively low
energy is allocated to frequency components being associated
with a confidence level that represents a comparatively low
degree of certainty.
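
The classification confidence can be illustrated with a short Python sketch for scalar features and Gaussian states. The function names and the example priors, means and variances are hypothetical values chosen only to show the computation of P(S = s | Z = z_0) by Bayes' rule.

```python
import numpy as np

def gaussian_pdf(z, mean, var):
    """Density of a scalar Gaussian with the given mean and variance."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def classification_confidence(z0, priors, means, variances):
    """Return the most likely state and the probability that it is correct."""
    joint = np.array([p * gaussian_pdf(z0, m, v)
                      for p, m, v in zip(priors, means, variances)])
    posterior = joint / joint.sum()        # Bayes' rule
    state = int(np.argmax(posterior))
    return state, float(posterior[state])

priors, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
state_far, conf_far = classification_confidence(3.0, priors, means, variances)
state_mid, conf_mid = classification_confidence(0.0, priors, means, variances)
print(state_far, conf_far, state_mid, conf_mid)
```

An observation far from the decision boundary yields a confidence close to 1, while an observation midway between the two states yields a confidence of 0.5, in which case only a low energy would be allocated.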

The GMM is typically trained by means of an
expectation-maximisation (EM) algorithm in order to find the
maximum likelihood estimate of the unknown but fixed parameters
of the GMM given the observed data. According to an alternative
embodiment of the invention, the unknown parameters of the GMM
are instead themselves regarded as stochastic variables. A model
uncertainty may also be incorporated by including a distribution
of the parameters into the standard GMM. Consequently, the GMM
would be a model of the joint distribution f_{Z,Θ}(z,θ) of feature
vectors z and the underlying parameters θ, i.e.:

$$ f_{Z,\Theta}(z, \theta) = \sum_{m=1}^{M} \alpha_m f_{Z|\Theta}(z \mid \theta_m)\, f_{\Theta}(\theta) $$

The distribution f_{Z,Θ}(z,θ) is then used to compute the estimates
of the high-band parameters. For instance, the expression for
calculating the estimated energy-ratio ĝ, when using a proposed
asymmetric cost-function, is:

$$ \hat{g} = \arg\min_{\hat{g}} \int \left( b\,U(\hat{g} - g) + (\hat{g} - g)^2 \right) f_{G|XR}(g \mid x, r)\, dg $$

An incorporation of the model uncertainty for the estimated
energy-ratio ĝ results in the expression:

$$ \hat{g} = \arg\min_{\hat{g}} \iint \left( b\,U(\hat{g} - g) + (\hat{g} - g)^2 \right) f_{G|XR}(g \mid x, r, \theta)\, f_{\Theta}(\theta)\, dg\, d\theta $$

Whenever the distribution f_Θ(θ) and/or the distribution
f_{G|XR}(g|x,r,θ) are broad, this will be interpreted as an
indicator of a comparatively low confidence level, which in turn
will result in a relatively low energy being allocated to the
corresponding frequency components. Otherwise (i.e. if both
distributions f_Θ(θ) and f_{G|XR}(g|x,r,θ) are narrow), it is
presumed that the confidence level is comparatively high, and
therefore a relatively high energy may be allocated to the
corresponding frequency components.

Rapid (and undesired) fluctuations of the estimated energy ratio
ĝ are avoided by temporally smoothing the estimated
energy ratio ĝ into a temporally smoothed energy ratio estimate
ĝ_smooth. This can be accomplished by using a combination of a
current estimate and, for instance, two previous estimates
according to the expression:

$$ \hat{g}_{\mathrm{smooth}} = 0.5\,\hat{g}_n + 0.3\,\hat{g}_{n-1} + 0.2\,\hat{g}_{n-2} $$

where n represents a current segment number, n-1 a previous
segment number and n-2 a still earlier segment number.
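
The temporal smoothing can be sketched in a few lines of Python. The function name smooth_energy_ratio is hypothetical, the middle weight 0.3 is assumed so that the three weights sum to one, and the handling of the first two segments (re-using the earliest available estimate) is an assumption of this sketch, since the description does not specify it.

```python
def smooth_energy_ratio(estimates, weights=(0.5, 0.3, 0.2)):
    """Weighted combination of the current and the two previous estimates."""
    smoothed = []
    for n in range(len(estimates)):
        # Assumption: before enough history exists, repeat the first estimate.
        history = [estimates[max(n - k, 0)] for k in range(len(weights))]
        smoothed.append(sum(w * g for w, g in zip(weights, history)))
    return smoothed

# A single outlier of 10 in otherwise zero estimates is spread over three
# segments instead of appearing as one abrupt jump.
result = smooth_energy_ratio([0.0, 10.0, 0.0, 0.0])
print(result)
```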

A high-band shape estimator 104b is included in the wide-band
envelope estimator 104 in order to create a combination of the
high-band shape and energy-ratio, which is probable for typical
acoustic signals, such as speech signals. An estimated
high-band envelope ŷ is produced by conditioning on the estimated
energy ratio ĝ, the narrow-band shape x and the degree of voicing
r in the narrow-band acoustic segment s.

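A GMM with diagonal covariance matrices gives an MMSE
estimate of the high-band shape ŷ_MMSE according to the
expression:

$$
\begin{aligned}
\hat{y}_{\mathrm{MMSE}} &= E[Y \mid X = x, R = r, G = \hat{g}] \\
&= \sum_{m=1}^{M} \frac{\alpha_m f_{XRG}(x, r, \hat{g} \mid \theta_m)}{\sum_{n=1}^{M} \alpha_n f_{XRG}(x, r, \hat{g} \mid \theta_n)}\, \mu_{y,m}
\end{aligned}
$$

where μ_{y,m} denotes the mean of the high-band shape in mixture
component m.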
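
The expression above amounts to a weighted average of the per-component high-band means, where the weights are the posterior probabilities of the mixture components given the observed vector (x, r, ĝ). A minimal Python sketch under the diagonal-covariance assumption is shown below; the function names and the toy two-component model are hypothetical and serve only to illustrate the computation.

```python
import numpy as np

def log_diag_gauss(u, means, variances):
    """Log-density of u under each diagonal Gaussian (one row per component)."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * variances)
                         + (u - means) ** 2 / variances, axis=-1)

def mmse_high_band_shape(u, alphas, means_u, vars_u, means_y):
    """Posterior-weighted mean of the high-band shape given observed u."""
    log_w = np.log(np.asarray(alphas)) + log_diag_gauss(u, means_u, vars_u)
    log_w -= log_w.max()                   # numerical stabilisation
    w = np.exp(log_w)
    w /= w.sum()                           # posterior component weights
    return w @ means_y

# Toy model: two components, 2-D observed part u = (x, r, g_hat) stand-in,
# 2-D high-band shape y. All numbers are illustrative.
alphas = np.array([0.5, 0.5])
means_u = np.array([[0.0, 0.0], [10.0, 10.0]])
vars_u = np.ones((2, 2))
means_y = np.array([[1.0, 1.0], [5.0, 5.0]])

y_near_first = mmse_high_band_shape(np.array([0.0, 0.0]), alphas, means_u, vars_u, means_y)
y_between = mmse_high_band_shape(np.array([5.0, 5.0]), alphas, means_u, vars_u, means_y)
print(y_near_first, y_between)
```

Because the covariance matrices are diagonal, the conditional mean of y within each component does not depend on the observed values, so only the component weights change with the observation.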

The excitation extension unit 105 receives the narrow-band
acoustic signal a_NB and, on the basis thereof, produces an extended
excitation signal E_WB. As mentioned earlier, Figure 3 shows an
example spectrum A_NB of an acoustic source signal a_source after
having been passed through a narrow-band channel that has a
bandwidth W_NB.

Basically, the extended excitation signal E WB is generated by
means of spectral folding of a corresponding excitation signal
E NB for the narrow-band acoustic signal a NB around a particular
frequency. In order to ensure sufficient energy in the frequency
region closest above the upper band limit f Nu of the narrow-band
acoustic signal a NB, a part of the narrow-band excitation
spectrum E NB between a first frequency f 1 and a second
frequency f 2 (where f 1 < f 2 < f Nu) is cut out, e.g. f 1 = 2 kHz
and f 2 = 3 kHz, and repeatedly up-folded around first f 2, then
2f 2 - f 1, 3f 2 - 2f 1 etc. as many times as is necessary to cover
at least the entire band up to the upper-most band limit f Wu.
Hence, a wide-band excitation spectrum E WB is obtained. According
to a preferred embodiment of the invention, the obtained excitation
spectrum E WB is produced such that it smoothly evolves to a white
noise spectrum. This avoids an overly periodic excitation at
the higher frequencies of the wide-band excitation spectrum
E WB. For instance, the transition between the up-folded narrow-band
excitation spectrum E NB and the noise spectrum may be set such
that at the frequency f = 6 kHz the noise spectrum dominates
totally over the periodic spectrum. It is preferable, however not
necessary, to give the wide-band excitation spectrum E WB an
amplitude equal to the mean value of the amplitude of the
narrow-band excitation spectrum E NB. According to an
embodiment of the invention, the transition frequency depends
on the confidence level for the higher frequency components,
such that a comparatively high degree of certainty for these
components results in a relatively high transition frequency, and
conversely, a comparatively low degree of certainty for these
components results in a relatively low transition frequency.
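
The repeated up-folding described above can be sketched as follows. The sketch operates on a list of magnitude-spectrum bins and assumes that the frequencies f 1 and f 2 have already been converted to bin indices k1 and k2; the function name and this bin-based interface are hypothetical. Mirroring the cut-out segment around f 2 gives a reversed copy, mirroring that copy around 2f 2 - f 1 restores the original orientation, and so on, which is why the copies alternate. The gradual transition to a noise spectrum is not included.

```python
def fold_excitation(nb_mag, k1, k2, n_wide):
    """Extend a narrow-band magnitude spectrum by repeated up-folding.

    nb_mag  magnitude bins of the narrow-band excitation spectrum
    k1, k2  bin indices corresponding to f1 and f2 (k1 < k2)
    n_wide  number of bins in the wide-band excitation spectrum
    """
    if not 0 <= k1 < k2 <= len(nb_mag):
        raise ValueError("require 0 <= k1 < k2 <= len(nb_mag)")
    segment = list(nb_mag[k1:k2])
    wide = list(nb_mag[:k2])      # keep the original spectrum up to f2
    mirrored = True               # the first fold around f2 reverses the segment
    while len(wide) < n_wide:
        wide.extend(segment[::-1] if mirrored else segment)
        mirrored = not mirrored   # each further fold flips the orientation
    return wide[:n_wide]
```

For example, with bins 0 to 7, k1 = 4 and k2 = 6, the segment [4, 5] is appended as [5, 4], then [4, 5], then [5, 4] until the requested width is reached.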

The high band shape estimator 106a in the wide-band filter 106
receives the estimated high-band envelope y from the high
band shape estimator 104b and receives the wide-band
excitation spectrum E WB from the excitation extension unit 105.
On basis of the received signals y and E WB, the high band
shape estimator 106a produces a high-band envelope spectrum
S Y that is shaped with the estimated high-band envelope y. This
frequency shaping of the excitation is performed in the
frequency domain by (i) computing the wide-band excitation
spectrum E WB and (ii) multiplying the high-band part thereof with a
spectrum S Y of the estimated high-band envelope y. The high-band
envelope spectrum S Y is computed as:

$$S_Y = 10^{\hat{y}/20}$$

A multiplier 106b receives the high-band envelope spectrum S Y
from the high band shape estimator 106a and receives the
temporally smoothed energy ratio estimate g smooth from the
energy ratio estimator 104a. On basis of the received signals S Y
and g smooth, the multiplier 106b generates a high-band energy y 0.
The high-band energy y 0 is determined by computing a first
LFCC using only a high-band part of the spectrum between f Nu
and f Wu (where e.g. f Nu = 3,3 kHz and f Wu = 8,0 kHz). The high-band
energy y 0 is adjusted such that it satisfies the equation:

$$y_0 = \hat{g}_{\mathrm{smooth}} + c_0$$

where c 0 is the energy of the current narrow-band segment
(computed by the feature extraction unit 101) and g smooth is the
energy ratio estimate (produced by the energy ratio estimator
104a).
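
A rough sketch of this energy adjustment is given below. It assumes that all energies are expressed in dB and, as a deliberate simplification, measures the high-band energy as the mean power of the magnitude bins rather than through a first LFCC as in the description. The function name and parameters are hypothetical.

```python
import math


def scale_high_band(high_band_mag, g_smooth_db, c0_db):
    """Scale high-band magnitudes so that their energy equals g_smooth + c0.

    Assumption: energies are in dB and the high-band energy is taken as the
    mean power of the bins (a stand-in for the LFCC-based measure).
    """
    power = sum(m * m for m in high_band_mag) / len(high_band_mag)
    if power <= 0.0:
        raise ValueError("high-band spectrum has no energy to scale")
    current_db = 10.0 * math.log10(power)
    target_db = g_smooth_db + c0_db          # y0 = g_smooth + c0
    gain = 10.0 ** ((target_db - current_db) / 20.0)
    return [gain * m for m in high_band_mag]
```

Because the equation is additive in the log domain, the adjustment reduces to a single multiplicative gain on the linear magnitudes.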

The high-pass filter 107 receives the high-band energy signal y 0
from the high-band shape reconstruction unit 106 and produces
in response thereto a high-pass filtered signal HP(y 0).
Preferably, the cut-off frequency of the high-pass filter 107 is
set to a value above the upper bandwidth limit f Nu for the
narrow-band acoustic signal a NB, e.g. 3,7 kHz. The stop-band may
be set to a frequency in proximity of the upper bandwidth limit
f Nu for the narrow-band acoustic signal a NB, e.g. 3,3 kHz, with an
attenuation of -60 dB.

The up-sampler 102 receives the narrow-band acoustic signal
a NB and produces, on basis thereof, an up-sampled signal a NB-u
that has a sampling rate which matches the bandwidth W WB of
the wide-band acoustic signal a WB that is being delivered via the
signal decoder's output. Provided that the up-sampling involves
a doubling of the sampling frequency, the up-sampling can be
accomplished simply by means of inserting a zero valued
sample between each original sample in the narrow-band
acoustic signal a NB. Of course, any other (non-2) up-sampling
factor is likewise conceivable. In that case, however, the
up-sampling scheme becomes slightly more complicated. Due to
the aliasing effect of the up-sampling, the resulting up-sampled
signal a NB-u must also be low-pass filtered. This is performed in
the following low-pass filter 103, which delivers a low-pass
filtered signal LP(a NB-u) on its output. According to a preferred
embodiment of the invention, the low-pass filter 103 has an
attenuation of approximately -40 dB in the high-band W HB.
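
The up-sampling by a factor of two and the subsequent low-pass filtering can be illustrated with the sketch below. The filter is a generic Hamming-windowed sinc design chosen only for this example; the tap count, the cut-off at half the new Nyquist frequency and the function names are assumptions, and the design is not tuned to the -40 dB figure mentioned above.

```python
import math


def upsample_by_two(x):
    """Insert one zero-valued sample after each original sample."""
    out = []
    for sample in x:
        out.append(sample)
        out.append(0.0)
    return out


def design_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is relative to Nyquist (0..1)."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = cutoff if t == 0 else math.sin(math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]  # normalise to unity gain at DC


def fir_filter(x, taps):
    """Direct-form FIR filtering with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y


def upsample_and_filter(x, num_taps=31):
    """Zero-insertion up-sampling followed by low-pass filtering."""
    up = upsample_by_two(x)
    taps = design_lowpass(num_taps, 0.5)
    # A gain of 2 compensates for the energy lost to the inserted zeros.
    return [2.0 * v for v in fir_filter(up, taps)]
```

The low-pass filter removes the spectral image that zero insertion creates above the original band, so that only the original narrow-band content remains in the up-sampled signal.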

Finally, the adder 108 receives the low-pass filtered signal
LP(a NB-u), receives the high-pass filtered signal HP(y 0) and adds
the received signals together and thus forms the wide-band
acoustic signal a WB, which is delivered on the signal decoder's
output.

To sum up, a general method of producing a wide-band
acoustic signal on basis of a narrow-band acoustic signal will
now be described with reference to a flow diagram in Figure 9.

A first step 901 receives a segment of the incoming narrow-band
acoustic signal. A following step 902 extracts at least one
essential attribute from the narrow-band acoustic signal, which
is to form a basis for estimated parameter values of a
corresponding wide-band acoustic signal. The wide-band
acoustic signal includes wide-band frequency components
outside the spectrum of the narrow-band acoustic signal (i.e.
either above, below or both).

A step 903 then determines a confidence level for each wide-band
frequency component. Either a specific confidence level is
assigned to (or associated with) each wide-band frequency
component individually, or a particular confidence level refers
collectively to two or more wide-band frequency components.

Subsequently, a step 904 investigates whether a confidence
level has been allocated to all wide-band frequency
components, and if this is the case, the procedure is forwarded
to a step 909. Otherwise, a following step 905 selects at least
one new wide-band frequency component and allocates thereto
a relevant confidence level. Then, a step 906 examines if the
confidence level in question satisfies a condition Γ h for a
comparatively high degree of certainty (according to any of the
above-described methods). If the condition Γ h is fulfilled, the
procedure continues to a step 908, in which a relatively high
parameter value is allowed to be allocated to the wide-band
frequency component(s), whereafter the procedure is
looped back to the step 904. Otherwise, the procedure continues
to a step 907, in which a relatively low parameter value is
allowed to be allocated to the wide-band frequency
component(s), whereafter the procedure is looped back to the
step 904.

The step 909 finally produces a segment of the wide-band
acoustic signal, which corresponds to the segment of the
narrow-band acoustic signal that was received in the step 901.
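
The loop over steps 904 to 908 can be summarised in the following sketch. It is one possible reading only: the condition for a comparatively high degree of certainty is modelled as a simple threshold on a numeric confidence value, and "allowing" only a relatively low parameter value is modelled as capping the estimate. The names threshold and low_cap are hypothetical and do not appear in the description.

```python
def allocate_parameters(estimates, confidences, threshold, low_cap):
    """Allocate a parameter value to each wide-band frequency component.

    estimates    estimated parameter value (e.g. energy) per component
    confidences  confidence level per component
    threshold    hypothetical stand-in for the high-certainty condition
    low_cap      hypothetical upper limit applied when certainty is low
    """
    allocated = []
    for estimate, confidence in zip(estimates, confidences):
        if confidence >= threshold:
            # Step 908: a relatively high value is allowed.
            allocated.append(estimate)
        else:
            # Step 907: only a relatively low value is allowed.
            allocated.append(min(estimate, low_cap))
    return allocated


values = allocate_parameters([10.0, 10.0, 2.0], [0.9, 0.2, 0.1], 0.5, 4.0)
```

In this example the first component keeps its estimate, the second is limited to the cap, and the third is unaffected because its estimate is already below the cap.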

Naturally, all of the process steps, as well as any sub-sequence
of steps, described with reference to Figure 9 above may be
carried out by means of a computer program being directly
loadable into the internal memory of a computer, which includes
appropriate software for performing the necessary steps when
the program is run on a computer. The computer program can
likewise be recorded onto any kind of computer readable
medium.

The term "comprises/comprising" when used in this specification 
is taken to specify the presence of stated features, integers, 
steps or components. However, the term does not preclude the 
presence or addition of one or more additional features, 
integers, steps or components or groups thereof. 



The invention is not restricted to the described embodiments in 
the figures, but may be varied freely within the scope of the 
claims. 

Claims 

1. A method of producing a wide-band acoustic signal (a WB)
based on a narrow-band acoustic signal (a NB), the spectrum
(A WB) of the wide-band acoustic signal (a WB) having a larger
bandwidth than the spectrum (A NB) of the narrow-band acoustic
signal (a NB), the method involving

extraction of at least one essential attribute (z NB (r, c), E NB)
from the narrow-band acoustic signal (a NB), and

estimation of a parameter describing aspects of wide-band
frequency components outside the spectrum (A NB) of the
narrow-band acoustic signal (a NB) based on at least one essential
attribute (z NB (r, c), E NB), characterised by allocating a
parameter value to a particular wide-band frequency component
based on a corresponding confidence level.

2. A method according to claim 1, characterised by
allocating the parameter value such that

a relatively high parameter value is allowed to be allocated
to the frequency component if the confidence level indicates a
comparatively high degree of certainty, and

a relatively low parameter value is allowed to be allocated
to the frequency component if the confidence level indicates a
comparatively low degree of certainty.

3. A method according to any one of the claims 1 or 2,
characterised by the parameter value representing a signal
energy.

4. A method according to any one of the claims 1-3,
characterised by the spectrum (A WB) of the wide-band acoustic
signal (a WB) comprising

a low-band (W LB) including wide-band frequency components
below a lower bandwidth limit (f Nl) of the spectrum (A NB) of
the narrow-band acoustic signal (a NB), and

a high-band (W HB) including wide-band frequency components
above an upper bandwidth limit (f Nu) of the spectrum (A NB)
of the narrow-band acoustic signal (a NB),

the method involving allocating a confidence level that
represents a high degree of certainty to all frequency components
in the low-band (W LB).

5. A method according to any one of the claims 1-4,
characterised by

receiving the narrow-band acoustic signal (a NB) and on
basis thereof producing an up-sampled signal (a NB-u) having a
sampling rate that matches the bandwidth (W WB) of the
wide-band acoustic signal (a WB), and

low-pass filtering the up-sampled signal (a NB-u) into a
low-pass filtered signal (LP(a NB-u)).

6. A method according to claim 5, characterised by the
producing of the up-sampled signal (a NB-u) involving insertion of
zero valued samples between samples of the narrow-band
acoustic signal (a NB).

7. A method according to any one of the claims 4-6,
characterised by involving estimating a wide-band envelope
(s e) on basis of at least one essential attribute (z NB (r, c)).

8. A method according to claim 7, characterised by involving
extending an excitation (E NB) of the narrow-band acoustic signal
(a NB), the extension involving at least one spectral folding of a
fraction (f 1 - f 2) of an excitation spectrum (E NB) of the
narrow-band acoustic signal (a NB).

9. A method according to claim 8, characterised by involving
wide-band filtering of the extended excitation spectrum (E WB)
into a wide-band energy signal (y 0), the wide-band filtering being
based on the wide-band envelope estimation (s e).

10. A method according to claim 9, characterised by involving
high-pass filtering of the wide-band energy signal (y 0) into a
high-pass filtered signal (HP(y 0)).

11. A method according to claim 10, characterised by
involving receiving the high-pass filtered signal (HP(y 0)),
receiving the low-pass filtered signal (LP(a NB-u)) and producing
the wide-band acoustic signal (a WB) as the sum of the received
signals.

12. A method according to any one of the preceding claims,
characterised by the at least one essential attribute (z NB (r, c))
representing a degree of voicing and a spectral envelope (c).

13. A method according to claim 12, characterised by the
degree of voicing being determined by a normalised
autocorrelation function.

14. A method according to any one of the claims 12 or 13,
characterised by the spectral envelope (c) being represented
by means of linear frequency cepstral coefficients.

15. A method according to any one of the claims 12 or 13,
characterised by the spectral envelope being represented by
means of line spectral frequencies.

16. A method according to any one of the claims 12 or 13,
characterised by the spectral envelope being represented by
means of Mel frequency cepstral coefficients.

17. A method according to any one of the claims 12 or 13,
characterised by the spectral envelope being represented by
means of linear prediction coefficients.

18. A method according to any one of the claims 7-17,
characterised by the estimation of the high-band (W HB) fraction
of the wide-band envelope (s e) involving Gaussian mixture
modelling.

19. A method according to claim 18, characterised by the
Gaussian mixture modelling involving

Bayes classification of at least one narrow-band feature
vector into a mixture component of a Gaussian mixture model,
and

computation of a value that indicates the probability that
the classification is correct.

20. A method according to claim 18, characterised by the
Gaussian mixture model representing a joint distribution of
feature vectors and underlying parameters.

21. A method according to any one of the claims 7-17,
characterised by the estimation of the high-band (W HB) fraction
of the wide-band envelope (s e) involving hidden Markov
modelling.

22. A computer program directly loadable into the internal
memory of a computer, comprising software for performing the
steps of any of the claims 1-21 when said program is run on the
computer.

23. A computer readable medium, having a program recorded
thereon, where the program is to make a computer perform the
steps of any of the claims 1-21.

24. A signal decoder for producing a wide-band acoustic signal
(a WB) from a narrow-band acoustic signal (a NB), the spectrum
(A WB) of the wide-band acoustic signal (a WB) having a larger
bandwidth than the spectrum (A NB) of the narrow-band acoustic
signal (a NB), the signal decoder comprising:

a feature extraction unit (101) receiving the narrow-band
acoustic signal (a NB) and on basis thereof producing at least one
essential attribute (z NB (r, c), E NB) of the narrow-band acoustic
signal (a NB), and

at least one band extension unit (102 - 108) receiving the
narrow-band acoustic signal (a NB), receiving the at least one
essential attribute (z NB (r, c), E NB) and on basis of the received
signals producing the wide-band acoustic signal (a WB),
characterised in that

the signal decoder is arranged to allocate a parameter with
respect to a particular wide-band frequency component based on a
corresponding confidence level.

25. A signal decoder according to claim 24, characterised in
that the signal decoder is arranged to allocate the parameter
such that

a relatively high parameter value is allowed to be allocated
to the frequency component if the confidence level indicates a
comparatively high degree of certainty, and

a relatively low parameter value is allowed to be allocated
to the frequency component if the confidence level indicates a
comparatively low degree of certainty.

26. A signal decoder according to claim 24 or 25,
characterised in that the parameter value represents a signal
energy.

27. A signal decoder according to any one of the claims 24-26,
characterised in that it comprises

an up-sampler (102) receiving the narrow-band acoustic
signal (a NB) and on basis thereof producing an up-sampled
signal (a NB-u) that has a sampling rate which matches the
bandwidth (W WB) of the wide-band acoustic signal (a WB), and

a low-pass filter (103) receiving the up-sampled signal
(a NB-u) and in response thereto producing a low-pass filtered
acoustic signal (LP(a NB-u)).

28. A signal decoder according to any one of the claims 24-27,
characterised in that it comprises a wide-band envelope
estimator (104) receiving the at least one essential attribute
(z NB (r, c)) and on basis thereof producing an estimated
wide-band envelope (s e).

29. A signal decoder according to claim 28, characterised in
that the wide-band envelope estimator (104) comprises an
energy ratio estimator (104a) receiving the at least one essential
attribute (z NB (r, c)) and in response thereto producing an
estimated energy ratio (g).

30. A signal decoder according to claim 29, characterised in
that the wide-band envelope estimator (104) comprises a
high-band shape estimator (104b) receiving the at least one essential
attribute (z NB (r, c)), receiving the estimated energy ratio (g) and
on basis of the received signals producing an estimated
high-band envelope (y).



WO 02/086867 PCT/SE02/00485



31. A signal decoder according to any one of the claims 28-30,
characterised in that it comprises an excitation extension unit
(105) receiving the narrow-band acoustic signal (a NB) and in
response thereto producing an extended excitation spectrum
(E WB), the extended excitation spectrum (E WB) comprising
frequency components outside the spectrum (A NB) of the
narrow-band acoustic signal (a NB).

32. A signal decoder according to claim 31, characterised in
that it comprises a wide-band filter (106) receiving the extended
excitation spectrum (E WB), receiving the wide-band envelope
estimation (s e) and on basis of the received signals producing a
wide-band energy signal (y 0).

33. A signal decoder according to claim 32, characterised in
that the wide-band filter (106) comprises a high-band shape
reconstruction unit (106a) receiving the extended excitation
spectrum (E WB), receiving the estimated high-band envelope (y)
and on basis of the received signals producing a high-band
envelope spectrum (S Y).

34. A signal decoder according to claim 33, characterised in
that

the energy ratio estimator (104a) comprises means for
producing a temporally smoothed energy ratio estimate (g smooth)
on basis of the at least one essential attribute (z NB (r, c)), and

the wide-band filter (106) comprises a multiplier (106b)
receiving the high-band envelope spectrum (S Y), receiving the
temporally smoothed energy ratio estimate (g smooth) and on basis
of the received signals producing the wide-band energy signal
(y 0).

35. A signal decoder according to any one of the claims 31-34,
characterised in that it comprises a high-pass filter (107)
receiving the wide-band energy signal (y 0) and in response
thereto producing a high-pass filtered signal (HP(y 0)).

36. A signal decoder according to claim 35, characterised in that it
comprises an adder (108) receiving the high-pass filtered signal
(HP(y 0)), receiving the low-pass filtered signal (LP(a NB-u)) and
producing the wide-band acoustic signal (a WB) as a sum of the
received signals.